Abstract

Recently, it has been shown by Pietrzak et al. that a leakage chain rule does not hold
in general for the commonly used definition of HILL min-entropy. We introduce the concept
of modulus computational entropy and use it as a technical tool that allows us to prove a chain
rule for leakage. We show that the definition of modulus computational entropy is implied
by several, sometimes seemingly unrelated, assumptions, especially the ones already used in
the literature in the context of leakage. Our results indicate that the concept of modulus
entropy is, up to now, the weakest restriction that guarantees that the chain rule works.



1 Introduction 



Entropy is the most fundamental concept in Information Theory. First introduced in this context
by Shannon [Sha48] as a measure of the uncertainty associated with a probability distribution,
it has been generalized in many ways. The commonly used generalization of Shannon entropy is
Renyi entropy, defined for any nonnegative order, which includes Shannon entropy as the
special case of order 1. Informally, a reasonable entropy measure indicates how much randomness
a given distribution contains. According to this intuition, distributions uniform over large
sets should have very high entropy, in contrast to distributions that have small support or hit
a small set with high probability and are therefore easy to predict.

Indistinguishability and entropy. The notion of entropy has also been generalized for the
purposes of Computational Complexity Theory and Cryptography, to take computational aspects
into account. The reader might wish to refer to [Rey11] for a short survey. Historically,
computational entropy was first introduced in [Yao82] and, based on a different concept, in
[HILL99]. This last approach, based on the notion of indistinguishability, is the one we follow
in this work. Let us try to give some intuition here (the precise definitions will be given in
Section 2). To define the computational entropy of $X$, one relaxes the requirement that $X$ should
have entropy itself. Instead, we assume that $X$ is only close to a distribution $Y$ which has
suitable information-theoretic entropy. To make this work we have to specify two things: (a) the
entropy we use and (b) what "being close" means. We note that, for technical reasons,
special attention is usually paid to the case of Renyi entropy of order $\infty$, called min-entropy.
Min-entropy can be simply characterized by the unpredictability property, since it is nothing
more than the negative logarithm of the probability of the most likely value.
To give a rigorous formulation of (b), one uses the concept of indistinguishability, which is
in fact the same concept as separation in Convex Analysis or Topology. Namely, we say that
a function $D$ separates (distinguishes) a set $X$ from another set $Y$ with advantage at least $\epsilon$ if
$D(x) - D(y) \geq \epsilon$ for every $x \in X$, $y \in Y$. In turn, for a predefined class $\mathcal{D}$ of functions, two sets
are said to be $(\mathcal{D}, \epsilon)$-indistinguishable if there is no $D \in \mathcal{D}$ that can distinguish between these
two sets with advantage greater than $\epsilon$. The smaller $\epsilon$ and the wider the class $\mathcal{D}$ we take, the stronger
indistinguishability we obtain. In particular, indistinguishability applied to two probability
distributions (as one-element sets) and the class of all boolean functions (as distinguishers), each
acting on a probability distribution $P_X$ by taking the expected value $D(P_X) = \mathbf{E}_{x \leftarrow X} D(x)$,
yields the definition of statistical distance. In applications involving computational complexity,
one usually uses circuits of bounded size as the class of distinguishers.

Leakage Lemma and a Chain Rule. Leakage lemma is the term commonly used to refer
to various generalizations of the observation which, informally, states that the min-entropy
of a distribution $X$ conditioned on another distribution $Z$ over $\{0,1\}^m$ decreases,
with respect to the min-entropy of $X$, by at most $m$ (the number of bits in the string encoding $Z$).
The name comes from security-related applications, where one considers the entropy of a distribution
conditioned on information that might have been revealed to the adversary. The larger the difference
between the entropy of a distribution and the entropy of the corresponding conditioned distribution,
the larger the leakage is; such an approach, based on computational entropy, was first used by
Dziembowski and Pietrzak in [DP08]. In turn, the term leakage chain rule is used to state the
same principle for the case when we are given the entropy of an already conditioned distribution
and we condition it on yet another distribution. Such further conditioning of an already
conditioned distribution corresponds to the so-called "leakage-after-leakage" scenario.

Although the leakage chain rule is very easy to prove in the information-theoretic framework
for conditional min-entropy or even smooth min-entropy (in fact also for Renyi entropy of an
arbitrary order), a problem appears in the case of computational generalizations of entropy.
The computational leakage lemma [DP08, FR11] turned out not to give rise naturally to the
leakage chain rule, at least for important indistinguishability-based definitions of conditional
computational entropy, and this is addressed as an open problem in [FOR12]. A computational leakage
chain rule was proved only for specific scenarios, or by adding strong assumptions to the definitions
([FR11], [CKLR11]), or by using slightly changed definitions (see [Rey11] for the discussion of
computational relaxed entropy based on the Leakage Lemma of [GW10]). Recently, a counterexample
to the chain rule for computational min-entropy has been shown ([SK12]).

Our contribution. Interested in establishing the weakest possible additional assumptions
that make the leakage chain rule work for standard (defined via indistinguishability, based on
min-entropy) computational entropy, we define the modulus computational entropy and show
that its definition is satisfied under technical requirements which have been used by other authors
to prove a chain rule: the decomposable entropy introduced by Fuller and Reyzin [FR11] and
the samplability assumption used by Kai-Min Chung et al. in [CKLR11]. Furthermore, we
investigate three cases that have not been considered yet: (a) the case where the computational
entropy is sufficiently high, (b) the existence of an NP oracle to which distinguishers are given
access, and (c) the case when the leakage is relatively short. In all these cases our definition is
fulfilled and the chain rule works. Summing up, while we cannot solve this problem in general,
we solve it for a few important concrete cases and reduce these, together with the already known
solutions, to one single concept.

Outline of the work. Section 2 deals with some preliminary concepts, conventions and no- 
tations. In Section 3 we explain basic definitions and terminology being used in the case of 
computational entropy. In Section 3 we also discuss the cases where a chain rule is known to 
work. In Section 4 we define the modulus entropy and show that for modulus entropy the leakage 
chain rule holds. Section 5 contains a brief summary of the most important consequences of our 
results - partial solutions for the chain rule problem. Section 6 gives proofs of the conversion to 
modulus entropy for some cases, especially for the ones already having been considered in the 
literature in the context of leakage. Section 7 contains proofs of some technical results. 

2 Preliminaries 

Throughout this work we assume that all random variables are defined on some finite probability
space and that they take values in $\{0,1\}^*$. If $X$ is a random variable then $P_X$ will be its distribution.
When the context is clear we will sometimes slightly abuse the notation and denote $P_X$ by $X$.
Writing $X \in S$ we mean that $X$ takes its values in the set $S$. By $|S|$ we denote the cardinality
of $S$. For two random variables $X, Z$, by $X|Z = z$ we denote the distribution of $X$ conditioned
on $Z = z$, and $(X, Z)$ means the concatenation of $X$ and $Z$. For every $n$, by $U_n$ we denote
the uniform distribution over $\{0,1\}^n$. By $(\mathrm{det}\{0,1\}, s)$ and $(\mathrm{det}[0,1], s)$ we mean the class of
all deterministic circuits of size at most $s$, with output in the set $\{0,1\}$ and $[0,1]$ respectively.
Similarly, we denote by $(\mathrm{rand}\{0,1\}, s)$ the set of all randomized boolean circuits of size at most $s$.

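All logarithms are taken to the base 2. We say that a function $\mu : \mathcal{X} \to \mathbb{R}$ is a convex combination
of length $l$ of functions $\mu_i : \mathcal{X} \to \mathbb{R}$ if $\mu = \sum_{i=1}^{l} \alpha_i \mu_i$ for some nonnegative numbers $\alpha_i$
satisfying $\sum_{i} \alpha_i = 1$.

For $D : \mathcal{X} \to [0,1]$ and $k < \log|\mathcal{X}|$ we denote by $\mathrm{Max}^k_D \subset \mathcal{X}$ a set of cardinality $2^k$ such that
for every $x \in \mathrm{Max}^k_D$ and every $x' \notin \mathrm{Max}^k_D$ we have $D(x) \geq D(x')$. For $D : \mathcal{X} \to \{0,1\}$ we define
$|D| = \sum_{x} D(x)$.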

2.1 Min Entropy 

We start by recalling information-theoretic notions.

Definition 1 (Min Entropy). Given a random variable $X$ we say that it has at least $k$ bits of
min-entropy, and denote this by $\mathbf{H}_\infty(X) \geq k$, if and only if $\max_x P_X(x) \leq 2^{-k}$.

The conditional min-entropy can be defined in two ways, both compatible with the above
definition. The first one is given below.

Definition 2 (Worst-Case Conditional Min-Entropy). Given a pair of random variables $(X, Z)$
we say that $X$ conditioned on $Z$ has min-entropy at least $k$, and denote this by $\mathbf{H}_\infty(X|Z) \geq k$, if and
only if $\max_x P_{X|Z=z}(x) \leq 2^{-k}$ for every $z$.

It is called the worst-case because it requires X to have high min-entropy when it is conditioned 
on an event "Z = z" for every z. The alternative definition requires this fact to hold on average: 

Definition 3 (Average Conditional Min-Entropy). Given a pair of random variables $(X, Z)$ we
say that $X$ conditioned on $Z$ has average min-entropy at least $k$, and denote this by
$\widetilde{\mathbf{H}}_\infty(X|Z) \geq k$, if and only if

$$\mathbf{E}_{z \leftarrow Z} \left[ \max_x P_{X|Z=z}(x) \right] \leq 2^{-k}.$$



Usually it is not so important which of these definitions is used, because one can convert (via a
Markov-type argument) the average conditional min-entropy to the worst-case variant.

Lemma 1 (See [DORS08], Lemma 2.2). Suppose that $\widetilde{\mathbf{H}}_\infty(X|Z) \geq k$. Then
$\mathbf{H}_\infty(X|Z = z) \geq k - \log\frac{1}{\delta}$ holds with probability at least $1 - \delta$ over $z \leftarrow Z$.
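
The three notions above can be compared on a toy example. The following Python sketch is an illustration only; the function names and the joint distribution `JOINT` are assumptions made here, not part of the original text. It computes the worst-case and the average conditional min-entropy of a small joint table and checks the statement of Lemma 1 for a given $\delta$.

```python
import math
from collections import defaultdict

def min_entropy(dist):
    """H_inf(X) = -log2 max_x P_X(x), for dist mapping outcome -> probability."""
    return -math.log2(max(dist.values()))

def split_by_leakage(joint):
    """Turn a joint table {(x, z): p} into P_Z and the conditionals P_{X|Z=z}."""
    p_z = defaultdict(float)
    for (_, z), p in joint.items():
        p_z[z] += p
    cond = defaultdict(dict)
    for (x, z), p in joint.items():
        if p > 0:
            cond[z][x] = p / p_z[z]
    return dict(p_z), dict(cond)

def worst_case_min_entropy(joint):
    """Definition 2: the smallest H_inf(X | Z = z) over all z in the support."""
    _, cond = split_by_leakage(joint)
    return min(min_entropy(c) for c in cond.values())

def average_min_entropy(joint):
    """Definition 3: -log2 E_z [ max_x P_{X|Z=z}(x) ]."""
    p_z, cond = split_by_leakage(joint)
    return -math.log2(sum(p_z[z] * max(c.values()) for z, c in cond.items()))

def lemma1_holds(joint, delta):
    """Check Pr_z[ H_inf(X|Z=z) >= k - log2(1/delta) ] >= 1 - delta,
    where k is the average conditional min-entropy of the table."""
    p_z, cond = split_by_leakage(joint)
    k = average_min_entropy(joint)
    good = sum(p_z[z] for z, c in cond.items()
               if min_entropy(c) >= k - math.log2(1 / delta) - 1e-12)
    return good >= 1 - delta - 1e-12

# Hypothetical toy example: X is uniform on {0,1,2,3} and Z reveals whether X = 0.
JOINT = {(0, 1): 0.25, (1, 0): 0.25, (2, 0): 0.25, (3, 0): 0.25}
```

In this example the worst-case variant drops to zero bits (the event $Z = 1$ determines $X$), while the average variant still reports one bit, which is the kind of gap that Lemma 1 quantifies.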

2.2 Indistinguishability 

Below we outline the concept of indistinguishability, being a key point in defining computational 
entropy in the next section. 

Definition 4. Let $X$ and $Y$ be subsets of some set $\mathcal{P}$. Given a positive real number $\epsilon$ we say
that a function $F : \mathcal{P} \to [0,1]$ distinguishes between $X$ and $Y$ with advantage at least $\epsilon$ if
for every $x \in X$ and $y \in Y$ we have $|F(x) - F(y)| \geq \epsilon$.

Definition 5. Let $X$ and $Y$ be as in Definition 4. Given a class $\mathcal{F}$ consisting of $[0,1]$-valued
functions on $\mathcal{P}$, we say that $X$ and $Y$ are $(\mathcal{F}, \epsilon)$-indistinguishable if there is no $F \in \mathcal{F}$ that can
distinguish between $X$ and $Y$ with advantage greater than $\epsilon$.

In this paper we are mostly interested in the special case when $\mathcal{P}$ is equal to the set of all probability
distributions over some finite space $\Omega$. In this case, every function $D : \Omega \to [0,1]$ gives rise to a
distinguisher $F_D : \mathcal{P} \to [0,1]$ defined as:

$$F_D(\mu) = \mathbf{E}_{x \leftarrow \mu} D(x).$$

Thus, we will overload the notation and say that $D$ distinguishes between $X$ and $Y$ with advantage
at least $\epsilon$ if the corresponding function $F_D$ distinguishes between $X$ and $Y$ with advantage at
least $\epsilon$. We note that $D$ can also be a randomized function, which can be modeled by giving
$D$ an additional input $R$ chosen independently at random. In this case, the expected value in
the definition above is taken also over the choice of $R$.
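
To make the map $D \mapsto F_D$ concrete, the sketch below (an illustration, not part of the original text; all names and the two example distributions are assumptions) evaluates the advantage of a distinguisher on two finite distributions and confirms by brute force that the best boolean distinguisher achieves exactly the statistical distance, as mentioned in the introduction.

```python
from itertools import product

def expectation(D, mu):
    """F_D(mu) = E_{x <- mu} D(x) for a finite distribution mu: outcome -> probability."""
    return sum(p * D(x) for x, p in mu.items())

def advantage(D, mu, nu):
    """Absolute difference of the expected outputs of D on mu and on nu."""
    return abs(expectation(D, mu) - expectation(D, nu))

def best_boolean_advantage(mu, nu):
    """Brute-force maximum of the advantage over all boolean D on the joint support."""
    support = sorted(set(mu) | set(nu))
    best = 0.0
    for bits in product((0, 1), repeat=len(support)):
        table = dict(zip(support, bits))  # truth table of one boolean distinguisher
        best = max(best, advantage(table.__getitem__, mu, nu))
    return best

def statistical_distance(mu, nu):
    """Half of the L1 distance between the two probability vectors."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

# Hypothetical example distributions over {0, 1, 2}.
MU = {0: 0.5, 1: 0.3, 2: 0.2}
NU = {0: 0.2, 1: 0.3, 2: 0.5}
```

The brute-force search is exponential in the support size and is meant only for toy inputs; restricting the loop to a smaller family of truth tables would model a restricted class of distinguishers.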



3 Computational Entropy and Leakage - previous works 

As mentioned in the introduction, computational entropy can be obtained by generalizing
min-entropy (or another notion of entropy) in many ways. We follow the approach based on
indistinguishability, as it seems to be the most standard one and was originally used for studying
leakage [DP08] as well as in further leakage-related results [CKLR11, FR11, GW10].

3.1 Defining Computational Entropy



Three-layer definition. There are three key points, essential for defining computational
entropy via indistinguishability:

(a) specify for every $k$ what it means that a distribution "has (non-computational) entropy at
least $k$",

(b) model the adversary, in particular define his computational power, and determine his
maximal acceptable success probability, and

(c) define the measure of the "computational distance" between a given distribution and the
set of distributions with entropy at least $k$ (in the sense of (a)).

In (a) one usually uses an information-theoretic notion of entropy, most often the min-entropy^1.
For (b) one uses a pair $(\mathcal{D}, \epsilon)$ within the framework described in Section 2.2. Finally, a rigorous
formulation of (c) can be given in two ways, traditionally called the "HILL" and the "Metric"
versions. In the HILL version, while defining the entropy of a random variable $X$, we require $X$
to be indistinguishable from some distribution with high entropy (in the sense of (a)), whereas
in the definition of the Metric Entropy we require $X$ to be indistinguishable from the set of all
high-entropy distributions, which is a slightly weaker assumption. The formal definitions below
are provided for the conditional versions of both notions. The unconditional versions, denoted
$\mathbf{H}^{\mathrm{HILL},\mathcal{D},\epsilon}(X)$ and $\mathbf{H}^{\mathrm{Metric},\mathcal{D},\epsilon}(X)$, are special cases of these notions obtained by fixing $Z$
in the definitions below to be constant.

Definition 6 (HILL Computational Worst-Case Conditional Entropy). Let $X, Z$ be random
variables taking values in $\{0,1\}^n$ and $\{0,1\}^m$ respectively. Given $\epsilon > 0$ and a class of
distinguishers $\mathcal{D}$, we say that $X$ conditioned on $Z$ has at least $k$ bits of computational HILL entropy
against $(\mathcal{D}, \epsilon)$, and denote this by $\mathbf{H}^{\mathrm{HILL},\mathcal{D},\epsilon}(X|Z) \geq k$, if there exists a random variable $Y \in \{0,1\}^n$
satisfying $\mathbf{H}_\infty(Y|Z) \geq k$ such that $(X, Z)$ is $(\mathcal{D}, \epsilon)$-indistinguishable from $(Y, Z)$.

Definition 7 (Metric Computational Worst-Case Conditional Entropy). With $\epsilon, \mathcal{D}, X$ and $Z$
as in Def. 6, we say that $X$ conditioned on $Z$ has at least $k$ bits of computational metric entropy
against $(\mathcal{D}, \epsilon)$, and denote this by $\mathbf{H}^{\mathrm{Metric},\mathcal{D},\epsilon}(X|Z) \geq k$, if $(X, Z)$ is $(\mathcal{D}, \epsilon)$-indistinguishable from the set
of all distributions $(Y, Z)$ satisfying $\mathbf{H}_\infty(Y|Z) \geq k$.

The definitions of the HILL Computational Average-Case Conditional Entropy $\widetilde{\mathbf{H}}^{\mathrm{HILL},\mathcal{D},\epsilon}(X|Z)$
and the Metric Computational Average-Case Conditional Entropy $\widetilde{\mathbf{H}}^{\mathrm{Metric},\mathcal{D},\epsilon}(X|Z)$ are obtained
by replacing $\mathbf{H}_\infty(Y|Z) \geq k$ in Def. 6 and Def. 7 (resp.) with $\widetilde{\mathbf{H}}_\infty(Y|Z) \geq k$. We note that one
usually uses a different formulation of the definitions of Metric and HILL Entropy.

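Definition 6. $\mathbf{H}^{\mathrm{HILL},\mathcal{D},\epsilon}(X|Z) \geq k$ if there exists a random variable $Y \in \{0,1\}^n$ with
$\mathbf{H}_\infty(Y|Z) \geq k$ such that for every $D \in \mathcal{D}$ we have
$|\mathbf{E}_{(x,z) \leftarrow (X,Z)} D(x,z) - \mathbf{E}_{(x,z) \leftarrow (Y,Z)} D(x,z)| \leq \epsilon$.

Definition 7. $\mathbf{H}^{\mathrm{Metric},\mathcal{D},\epsilon}(X|Z) \geq k$ if for every $D \in \mathcal{D}$ there exists a random variable
$Y \in \{0,1\}^n$ with $\mathbf{H}_\infty(Y|Z) \geq k$ such that
$|\mathbf{E}_{(x,z) \leftarrow (X,Z)} D(x,z) - \mathbf{E}_{(x,z) \leftarrow (Y,Z)} D(x,z)| \leq \epsilon$.

It is not hard to verify that both formulations are equivalent. However, our more general
definitional approach appears to be more useful for the applications presented in the sequel.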



The equivalence between HILL and Metric-type Entropy. The Metric entropy, which
was introduced after the HILL one, is more convenient for proving leakage-related results.
Both versions are very close in practical applications, as there exists a conversion from Metric
Entropy (against real-valued circuits) to HILL entropy [BSW03]. This result in its full generality
can be stated as follows.

^1 We use only min-entropy in this work. See, however, [VZ12] for similar definitions based on Shannon entropy.

Theorem 1 (Generalization of [BSW03], Thm. 5.2). Let $\mathcal{P}$ be the set of all probability measures
over $\Omega$. Suppose that we are given a class $\mathcal{D}$ of $[0,1]$-valued functions on $\Omega$ with the following
property: if $D \in \mathcal{D}$ then $D^c \stackrel{\mathrm{def}}{=} 1 - D \in \mathcal{D}$. For $\delta > 0$, let $\mathcal{D}'$ be the class consisting of all
convex combinations of length $O\left(\frac{\log|\Omega|}{\delta^2}\right)$ over $\mathcal{D}$. Let $\mathcal{C} \subset \mathcal{P}$ be an arbitrary convex subset of
probability measures and let $X \in \mathcal{P}$ be a fixed distribution. Consider the following statements:

(a) $X$ is $(\mathcal{D}, \epsilon + \delta)$-indistinguishable from some distribution $Y \in \mathcal{C}$

(b) $X$ is $(\mathcal{D}', \epsilon)$-indistinguishable from the set of all distributions $Y \in \mathcal{C}$

Then (b) implies (a).

Remark 1. This result was formulated in [BSW03] in a less general form, namely $\Omega = \{0,1\}^n$, $\mathcal{C}$
is the set of distributions with min-entropy at least $k$, and $\mathcal{D}, \mathcal{D}'$ are the classes of $[0,1]$-valued
circuits of size $s$ and $O\left(s \cdot \frac{n}{\delta^2}\right)$ respectively. An inspection of the proof shows that: (a) the
chosen space $\Omega$ can be an arbitrary finite set, and the number $n$ appearing in the assertion is
equal to $\log|\Omega|$, (b) the chosen set $\mathcal{C}$ can be replaced by an arbitrary convex set of distributions,
(c) the complexity of the class $\mathcal{D}'$ is chosen only to ensure that $\mathcal{D}'$ contains all convex combinations
of length $O\left(\frac{\log|\Omega|}{\delta^2}\right)$ of elements of $\mathcal{D}$.

Remark 2. By choosing $\Omega = \{0,1\}^{n+m}$, a random variable $Z \in \{0,1\}^m$ and $\mathcal{C}$ to be the set of all
distributions $(Y, Z)$ satisfying $\mathbf{H}_\infty(Y|Z) \geq k$ or alternatively $\widetilde{\mathbf{H}}_\infty(Y|Z) \geq k$, we obtain
the conversion from Metric Conditional Entropy to HILL Conditional Entropy, for both the
worst-case and the average-case variants.

3.2 Leakage Rules 

We are now ready to state the leakage chain rule for conditional min-entropy and compare it with 
its known generalizations to computational case. Generally, we are interested in the following 
problem: 

Suppose we have a pair of random variables $(X, Z_1)$ and we know the conditional
entropy of $X$ given $Z_1$. What is the lower bound on the entropy of $X$ given $(Z_1, Z_2)$,
where $Z_2$ is some other (possibly correlated) random variable?

In the information-theoretic case we have the following estimate (cf. [DORS08], Lemma 2.2).

Lemma 2 (Leakage Chain Rule). Let $X, Z_1, Z_2$ be random variables over $\{0,1\}^n$, $\{0,1\}^{m_1}$, $\{0,1\}^{m_2}$
respectively. Then

$$\widetilde{\mathbf{H}}_\infty(X|Z_1, Z_2) \geq \widetilde{\mathbf{H}}_\infty(X|Z_1) - m_2 \qquad (1)$$

The name "Leakage Chain Rule" comes from the fact that we think of $Z_1$ and $Z_2$ as information
about $X$ that "leaked" subsequently to the adversary. In the computational framework, the first
leakage-related result appeared in [DP08] and was next improved in [FR11]. It is called the Leakage
Lemma as it deals with the case of one leakage only.
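
The information-theoretic chain rule (1) is easy to check numerically. The sketch below is an illustration under stated assumptions: the helper names and the way random joint distributions are sampled are choices made here, not taken from the original text. It uses the identity $\mathbf{E}_{z \leftarrow Z} \max_x P_{X|Z=z}(x) = \sum_z \max_x \Pr[X = x, Z = z]$ to compute the average conditional min-entropy directly from a joint table.

```python
import math
import random
from collections import defaultdict

def average_min_entropy(joint):
    """-log2 sum_z max_x P(X=x, Z=z), which equals -log2 E_z max_x P(X=x | Z=z).
    joint maps (x, z) -> probability; z may be any hashable leakage value."""
    best = defaultdict(float)
    for (_, z), p in joint.items():
        best[z] = max(best[z], p)
    return -math.log2(sum(best.values()))

def random_joint(n, m1, m2, seed):
    """A random joint distribution of (X, Z1, Z2) on n, m1 and m2 bits."""
    rng = random.Random(seed)
    keys = [(x, z1, z2) for x in range(2 ** n)
            for z1 in range(2 ** m1) for z2 in range(2 ** m2)]
    weights = [rng.random() for _ in keys]
    total = sum(weights)
    return {key: w / total for key, w in zip(keys, weights)}

def chain_rule_gap(n, m1, m2, seed):
    """Return H(X|Z1,Z2) - (H(X|Z1) - m2); Lemma 2 says this is non-negative."""
    joint = random_joint(n, m1, m2, seed)
    given_z1 = defaultdict(float)
    given_both = {}
    for (x, z1, z2), p in joint.items():
        given_z1[(x, z1)] += p            # marginalize the second leakage out
        given_both[(x, (z1, z2))] = p     # treat (z1, z2) as a single leakage
    return average_min_entropy(given_both) - (average_min_entropy(given_z1) - m2)
```

A call such as `chain_rule_gap(3, 2, 2, seed=0)` returns the slack in inequality (1) for one sampled distribution; the computational analogue discussed next has no such simple check.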

Lemma 3 (Leakage Lemma [FR11]). Let $X$ and $Z$ be random variables over $\{0,1\}^n$ and $\{0,1\}^m$,
resp. Then

$$\mathbf{H}^{\mathrm{Metric},[0,1],s',\epsilon'}(X|Z) \geq \mathbf{H}^{\mathrm{Metric},[0,1],s,\epsilon}(X) - m$$

where $s' = s + O(1)$ and $\epsilon' = 2^m \epsilon$.

Let us observe that, at least under the assumption that there exists an exponentially secure pseudorandom
generator, both losses, in quantity (by $m$ bits) and in security measured as $s/\epsilon$ (by a factor
almost equal to $2^m$), can appear simultaneously^2; see Theorem 9 in this work.

^2 In [FR11] the authors leave this problem as an open question.

It is a natural question to ask if the Leakage Chain Rule (Lemma 2) can be "translated" into
the computational version. In particular, one might be tempted to conjecture that for $X, Z_1$
and $Z_2$ as in Lemma 2 it holds that

$$\mathbf{H}^{\mathrm{Metric},[0,1],s',\epsilon'}(X|Z_1, Z_2) \geq \mathbf{H}^{\mathrm{Metric},[0,1],s,\epsilon}(X|Z_1) - m_2, \qquad (2)$$

with a security loss of factor $2^{m_2}$, where by security loss we mean $\frac{s'/\epsilon'}{s/\epsilon}$ (which reduces to $\epsilon/\epsilon'$ if
$s' \approx s$). Unfortunately, this conjecture is still unproven in its full generality [FOR12]. On the positive
side, some progress towards proving it has recently been made in [FR11] and [CKLR11], where it
is proven for restricted classes of entropies. In [FR11] this restriction is called decomposability.
More precisely, their definition is as follows.

Definition 8 ([FR11]). Let $X, Z$ be as in Lemma 3. We say that $X$ has decomposable metric
entropy conditioned on $Z$ at least $k$, and denote this by $\mathbf{H}^{\mathrm{Metric\text{-}d},[0,1],s,\epsilon}(X|Z) \geq k$, if for every $z$

$$\mathbf{H}^{\mathrm{Metric},[0,1],s,\epsilon(z)}(X|Z = z) \geq k(z)$$

where $\epsilon(z)$ and $k(z)$ are numbers satisfying $\mathbf{E}_{z \leftarrow Z} 2^{-k(z)} = 2^{-k}$ and $\mathbf{E}_{z \leftarrow Z}\, \epsilon(z) \leq \epsilon$.

Using this definition they are able to prove the following.

Theorem 2 ([FR11]). Let $X, Z_1, Z_2$ be as in Lemma 2. Then for $s' \approx s$ and $\epsilon' = 2^{m_2} \epsilon$, we
have

$$\mathbf{H}^{\mathrm{Metric\text{-}d},[0,1],s',\epsilon'}(X|Z_1, Z_2) \geq \mathbf{H}^{\mathrm{Metric\text{-}d},[0,1],s,\epsilon}(X|Z_1) - m_2.$$

In the other approach [CKLR11], the authors assume the existence of an efficiently samplable
distribution with high conditional min-entropy that is indistinguishable from $(X, Z_1)$. The precise
formulation of their result is given below.

Theorem 3 ([CKLR11]). Let $X, Z_1, Z_2$ be as above. Suppose that there exists a random variable
$Y'$ with the following properties: (a) $\mathbf{H}_\infty(Y'|Z_1) \geq k$ and (b) there exists a randomized circuit
$V$ receiving on its input $z_1 \in \mathrm{supp}(Z_1)$ and returning samples from $Y'|Z_1 = z_1$. Then

$$\mathbf{H}^{\mathrm{Metric},[0,1],s',\epsilon'}(X|Z_1, Z_2) \geq \mathbf{H}^{\mathrm{Metric},[0,1],s,\epsilon}(X|Z_1) - m_2 - \log\frac{1}{\delta},$$



for s' = ^}{s ■ 2^3- - So) , e' ^ € + 6. 

We note that there is yet another result related to the chain rule problem, due to [GW10]. The
authors prove a version of Lemma 3 for a slightly different definition of Metric Conditional Min-Entropy.
The difference is in Layer (a) of the definition: they require $(X, Z)$ to be indistinguishable from
all distributions $(Y, Z')$ satisfying $\mathbf{H}_\infty(Y|Z') \geq k$, where, in comparison to Definition 6, $Z'$ is
not necessarily equal to $Z$. As observed in [Rey11], one can easily generalize their approach to
prove an "efficient" computational version of Lemma 2 for this definition, with a loss of a factor at most
$\mathrm{poly}(2^{m_2}, \epsilon^{-1})$ in security. It seems, however, that in the context of leakage Definition 7 is more
suitable [CKLR11].

4 Modulus Entropy 

Our definition of modulus entropy is a bit different from Definition 7.

Definition 9 (Modulus Metric Entropy). Let $X \in \{0,1\}^n$ and $Z \in \{0,1\}^m$ be random variables.
Given $\epsilon > 0$ and a class of deterministic boolean functions $\mathcal{D}$, we say that $X$ conditioned on $Z$
has modulus entropy at least $k$ against $(\mathcal{D}, \epsilon)$, and denote this by $\mathbf{H}^{|\mathrm{Metric}|,\mathcal{D},\epsilon}(X|Z) \geq k$, if for every
$D \in \mathcal{D}$ there exists a random variable $Y \in \{0,1\}^n$ satisfying $\mathbf{H}_\infty(Y|Z) \geq k$ such that

$$\mathbf{E}_{z \leftarrow Z} \left| \mathbf{E}_{x \leftarrow (X|Z=z)} D(x,z) - \mathbf{E}_{x \leftarrow (Y|Z=z)} D(x,z) \right| \leq \epsilon \qquad (3)$$

We emphasize that the above definition, formulated here for the worst-case conditional entropy,
can also be stated for the average version, which is obtained just by replacing $\mathbf{H}_\infty$ with $\widetilde{\mathbf{H}}_\infty$. By
using Lemma 1 we immediately obtain a conversion (with some loss) between them:

Lemma 4. Suppose that $\widetilde{\mathbf{H}}^{|\mathrm{Metric}|,\mathcal{D},\epsilon}(X|Z) \geq k$. Then $\mathbf{H}^{|\mathrm{Metric}|,\mathcal{D},\epsilon+\delta}(X|Z) \geq k - \log\frac{1}{\delta}$.

Some intuitions behind modulus entropy. The only difference between Definition 7 and
Definition 9 is the order of the expectation and absolute value signs. Thus, by the
triangle inequality, the Modulus Entropy is no greater than the Metric Entropy. However, they are not
necessarily equal in general. Indeed, for $D$ distinguishing between $(X, Z)$ and $(Y, Z)$ with
advantage no greater than $\epsilon$, the contributions to this advantage from particular values of $z$, given
by the expressions $\epsilon_D(z) = \mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y|Z=z} D(x,z)$, can differ in sign. Hence,
although we have $|\mathbf{E}_{z \leftarrow Z}\, \epsilon_D(z)| \leq \epsilon$, this does not imply $\mathbf{E}_{z \leftarrow Z} |\epsilon_D(z)| \leq \epsilon$ as required by inequality
(3). In comparison to Definition 8, our approach is far more general, as it allows the numbers $\epsilon(z)$ as
well as $k(z)$ (in the average variant) to depend on the chosen $D$. From a technical point of
view, both definitions are formulated to control the contributions from particular outcomes of $Z$ to
the parameter $\epsilon$.
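
The sign cancellation described above can be seen on a two-point example. The sketch below is only an illustration with hypothetical names and numbers: it fixes one candidate $Y$ (uniform regardless of $z$) and one distinguisher $D(x, z) = x$, whereas Definitions 7 and 9 additionally quantify over $Y$ and $D$. For this fixed pair, the signed per-$z$ gaps cancel in the Metric-style quantity but not in the Modulus-style one.

```python
def per_z_gaps(D, p_z, x_given_z, y_given_z):
    """Signed gap e_D(z) = E_{x<-X|Z=z} D(x,z) - E_{x<-Y|Z=z} D(x,z) for every z."""
    gaps = {}
    for z in p_z:
        ex = sum(p * D(x, z) for x, p in x_given_z[z].items())
        ey = sum(p * D(x, z) for x, p in y_given_z[z].items())
        gaps[z] = ex - ey
    return gaps

def metric_style_advantage(gaps, p_z):
    """| E_z e_D(z) |: the absolute value is taken after averaging over z."""
    return abs(sum(p_z[z] * gaps[z] for z in p_z))

def modulus_style_advantage(gaps, p_z):
    """E_z | e_D(z) |: the absolute value is taken for every z separately."""
    return sum(p_z[z] * abs(gaps[z]) for z in p_z)

def first_bit(x, z):
    """Hypothetical distinguisher that outputs x and ignores z."""
    return x

# Hypothetical toy data: Z is a uniform bit, X is biased towards 1 - z,
# and the candidate Y is a uniform bit for both values of z.
P_Z = {0: 0.5, 1: 0.5}
X_GIVEN_Z = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.8, 1: 0.2}}
Y_GIVEN_Z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
GAPS = per_z_gaps(first_bit, P_Z, X_GIVEN_Z, Y_GIVEN_Z)
```

Here the two gaps are $+0.3$ and $-0.3$, so the averaged signed gap vanishes while the averaged absolute gap stays at $0.3$.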



4.1 Leakage Chain Rule for Modulus Entropy 

We now show how modulus entropy allows us to prove a leakage chain rule. We start with a
reformulation of the leakage lemma proved in [FR11].

Lemma 5 (Corollary from [FR11]). Let $D$ be a boolean function and $(X, Z)$ be as in Lemma 3.
Suppose that $|\mathbf{E}_{x \leftarrow X} D(x) - \mathbf{E}_{x \leftarrow Y} D(x)| \leq \epsilon$, where $\mathbf{H}_\infty(Y) \geq k$. Then for any $z \in
\mathrm{supp}(Z)$ there exists a distribution $Y_z$ with min-entropy at least $k(z) = k - \log\frac{1}{\Pr[Z = z]}$ such that

$$\left| \mathbf{E}_{x \leftarrow X|Z=z} D(x) - \mathbf{E}_{x \leftarrow Y_z} D(x) \right| \leq \frac{\epsilon}{\Pr[Z = z]}.$$

Now we are in a position to prove the following chain rule, achieving the optimal parameters.
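Theorem 4. Let $X, Z_1, Z_2$ be as in Lemma 2 and $\mathcal{D}$ be a class of boolean functions. Suppose that
$\widetilde{\mathbf{H}}^{|\mathrm{Metric}|,\mathcal{D},\epsilon}(X|Z_1) \geq k$. Then $\widetilde{\mathbf{H}}^{|\mathrm{Metric}|,\mathcal{D},2^{m_2}\epsilon}(X|Z_1, Z_2) \geq k - m_2$.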

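Proof. Fix a distinguisher $D = D(x, z_1, z_2)$. We will construct a distribution $(Y, Z_1, Z_2)$ such
that $\widetilde{\mathbf{H}}_\infty(Y|Z_1, Z_2) \geq k - m_2$ and $D$ cannot distinguish $(X, Z_1, Z_2)$ from $(Y, Z_1, Z_2)$ with
advantage better than $2^{m_2} \epsilon$.

For any $z_2$, let $(Y^{z_2}, Z_1)$ be a distribution corresponding to $D(\cdot, \cdot, z_2)$, whose existence is
guaranteed by Definition 9 (we use the notation $Y^{z_2}$ to emphasize that this distribution depends
also on $z_2$). More precisely, $(Y^{z_2}, Z_1)$ is such that

$$\mathbf{E}_{z_1 \leftarrow Z_1} \underbrace{\left| \mathbf{E}_{x \leftarrow (X|Z_1=z_1)} D(x, z_1, z_2) - \mathbf{E}_{x \leftarrow (Y^{z_2}|Z_1=z_1)} D(x, z_1, z_2) \right|}_{\epsilon_D(z_1, z_2)\, :=} \leq \epsilon \qquad (4)$$

holds (cf. (3) in Definition 9). For every pair $(z_1, z_2)$ let $\epsilon_D(z_1, z_2)$ denote the value within the
first expected value sign, as indicated in (4). Now, Lemma 5 implies that for any $z_1, z_2$ there
exists a distribution $Y'_{z_1,z_2}$ such that

$$\left| \mathbf{E}_{x \leftarrow X|(Z_1=z_1, Z_2=z_2)} D(x, z_1, z_2) - \mathbf{E}_{x \leftarrow Y'_{z_1,z_2}} D(x, z_1, z_2) \right| \leq \frac{\epsilon_D(z_1, z_2)}{\Pr[Z_2 = z_2 | Z_1 = z_1]} \qquad (5)$$

and its min-entropy $\mathbf{H}_\infty(Y'_{z_1,z_2})$ is at least $k(z_1, z_2)$, where

$$k(z_1, z_2) \geq \mathbf{H}_\infty(Y^{z_2} | Z_1 = z_1) - \log\frac{1}{\Pr[Z_2 = z_2 | Z_1 = z_1]} \qquad (6)$$

Let $(Y, Z_1, Z_2)$ be the distribution given by $(Y | Z_1 = z_1, Z_2 = z_2) = Y'_{z_1,z_2}$. We now have

$$\begin{aligned}
\mathbf{E}_{(z_1,z_2) \leftarrow (Z_1,Z_2)} & \left| \mathbf{E}_{x \leftarrow X|Z_1=z_1,Z_2=z_2} D(x, z_1, z_2) - \mathbf{E}_{x \leftarrow Y'_{z_1,z_2}} D(x, z_1, z_2) \right| \\
& \leq \sum_{z_1,z_2} \Pr[(Z_1, Z_2) = (z_1, z_2)] \cdot \frac{\epsilon_D(z_1, z_2)}{\Pr[Z_2 = z_2 | Z_1 = z_1]} \qquad \text{(by (5))} \\
& = \sum_{z_1,z_2} \Pr[Z_1 = z_1] \, \epsilon_D(z_1, z_2) \\
& \leq 2^{m_2} \epsilon,
\end{aligned}$$

where the last inequality follows from (4), applied to each of the at most $2^{m_2}$ values of $z_2$. It remains
to prove that $\widetilde{\mathbf{H}}_\infty(Y|Z_1, Z_2) \geq k - m_2$. We have:

$$\begin{aligned}
\mathbf{E}_{(z_1,z_2) \leftarrow (Z_1,Z_2)} 2^{-k(z_1,z_2)} & \leq \mathbf{E}_{(z_1,z_2) \leftarrow (Z_1,Z_2)} \left[ \max_x \Pr[Y^{z_2} = x | Z_1 = z_1] \cdot \frac{1}{\Pr[Z_2 = z_2 | Z_1 = z_1]} \right] \\
& = \sum_{z_1,z_2} \max_x \Pr[Y^{z_2} = x | Z_1 = z_1] \cdot \Pr[Z_1 = z_1] \\
& = \sum_{z_2} \mathbf{E}_{z_1 \leftarrow Z_1} \left[ \max_x \Pr[Y^{z_2} = x | Z_1 = z_1] \right] \\
& \leq 2^{m_2} \cdot 2^{-k},
\end{aligned}$$

where the first step follows from (6) and the last one from $\widetilde{\mathbf{H}}_\infty(Y^{z_2} | Z_1) \geq k$. □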

Chain Rule for entropy against different circuit classes. Theorem 4 deals only with
entropy against boolean deterministic distinguishers $\mathcal{D}$. It is natural to ask if one could replace
this class with a more general one; in particular, would the theorem still hold if $\mathcal{D}$ in its
statement is equal to the class of randomized or real-valued distinguishers? We answer this question
affirmatively in Lemma 6 below. To make its statement as strong as possible, in its assumption
we use the weakest possible option, which is the modulus entropy against boolean deterministic
circuits, and in its assertion we use the strongest option, i.e. the HILL entropy^3.

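Lemma 6. Let $X, Z$ be as in Theorem 3. Suppose that $\widetilde{\mathbf{H}}^{|\mathrm{Metric}|,s,\epsilon}(X|Z) \geq k$. In this case we
have that $\mathbf{H}^{\mathrm{HILL},s',\epsilon'}(X|Z) \geq k'$, where $\epsilon' = \epsilon + 2\delta$, $s' = s \cdot O\left(\frac{\delta^2}{n+m}\right)$ and $k' = k - \log\frac{1}{\delta}$.

Proof of Lemma 6. If $\widetilde{\mathbf{H}}^{|\mathrm{Metric}|,s,\epsilon}(X|Z) \geq k$ then, as we pointed out in the discussion
after Lemma 4, we have that $\widetilde{\mathbf{H}}^{\mathrm{Metric},\mathrm{det}\{0,1\},s,\epsilon}(X|Z) \geq k$. From Lemma 1 we obtain that
$\mathbf{H}^{\mathrm{Metric},\mathrm{det}\{0,1\},s,\epsilon+\delta}(X|Z) \geq k - \log\frac{1}{\delta}$. Applying Theorem 10 we obtain
$\mathbf{H}^{\mathrm{Metric},\mathrm{det}[0,1],s',\epsilon+\delta}(X|Z) \geq k - \log\frac{1}{\delta}$ where $s' = s + O(1)$. The claim now follows from Theorem 1. □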

Therefore, there is no meaningful loss in passing from Modulus Entropy to Metric Entropy, or 
even HILL Entropy. In the next section we consider some particular cases, where a conversion 
in the other direction is possible, up to negligible (in view of applications) loss. 



^3 Recall that for the HILL Entropy all kinds of circuits (deterministic boolean, deterministic real-valued,
randomized boolean) are equivalent [FR11]; thus we can abbreviate the notation, writing just $\mathbf{H}^{\mathrm{HILL},s,\epsilon}(X|Z)$.

5 Passing to Modulus Entropy



While modulus entropy, as shown in Theorem 4, solves the leakage chain rule problem, it remains
a rather technical assumption, not equivalent to the commonly used definitions. We will give
some concrete examples where its definition is fulfilled, and which thus admit the chain rule. All of
these examples, in comparison to the assertion of Theorem 4, rely on some additional assumption
added to the metric entropy of $X|Z$. The conversion to modulus entropy, that is, the estimate of the
loss in parameters, is summarized in the table below.



Additional assumptions on 

jjMotric,{0,l},^,^ ^ k 



Our conversion: H 



Metric!, 



{X\Z) ^ k' 



(a) Decomposabe entropy [FRll] 



Thm. E] 



(b) Samplability oiY\Z = z given z, 
where (F, Z) (X, Z) [C KLR11| 



k-Oi\og^] 



0[e) 



O 



it) 



Thm. [7] 



(c) Entropy against poly(n)-circuits, 
given an access to an NP oracle 



k-0(\og^] 



(«') 



poly(n, i) 



Thm. E 
(point |b| 



(d) Entropy very liigh, 
i.e. k>n-0 (log ^) 



k-O (logl) 



€2 



■log I 



Thm. [5] 
(point dj) 



(e) None 



2'e 



s-0(2™-*m) Thm.E] 



Table 1: Conversions to modulus entropy 

As shown in the table, some of these assumptions have been already introduced in the literature 
to prove leakage-related results. The proofs of conversions will be given in the next section. 



6 Proofs of conversion results 

Throughout all the proofs in this section, $X, Z$ are assumed to be random variables over
$\{0,1\}^n$, $\{0,1\}^m$ respectively. The proofs are based on the following technical lemma.

Lemma 7. Let $X, Z$ be arbitrary random variables over $\{0,1\}^n$, $\{0,1\}^m$. Suppose that $D$ is such
that for all distributions $(Y, Z)$ with $\mathbf{H}_\infty(Y|Z) \geq k$ the following holds:

$$\mathbf{E}_{z \leftarrow Z} \left| \mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y|Z=z} D(x,z) \right| \geq \epsilon. \qquad (7)$$

Then either for $D' = D$ or for $D' = D^c$ we have that for all distributions $(Y, Z)$ with $\mathbf{H}_\infty(Y|Z) \geq
k$ the following is true:

$$\Pr_{(x,z) \leftarrow (X,Z)} \left[ D'(x,z) - \mathbf{E}_{x \leftarrow Y|Z=z} D'(x,z) \geq \frac{\epsilon}{4} \right] \geq \frac{\epsilon^2}{16}.$$
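Proof. Consider the distribution $(Y^+, Z)$ which minimizes the left-hand side of (7). Define

$$\epsilon(z) := \left| \mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y^+|Z=z} D(x,z) \right|.$$

Observe that

$$\begin{aligned}
\mathbf{E}_{z \leftarrow Z}\, \epsilon(z) & = \min_{(Y,Z):\, \mathbf{H}_\infty(Y|Z) \geq k} \mathbf{E}_{z \leftarrow Z} \left| \mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y|Z=z} D(x,z) \right| \\
& = \mathbf{E}_{z \leftarrow Z} \min_{Y_z:\, \mathbf{H}_\infty(Y_z) \geq k} \left| \mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y_z} D(x,z) \right|.
\end{aligned}$$

Therefore, for every $z$ and every distribution $Y_z$ with min-entropy $\mathbf{H}_\infty(Y_z) \geq k$ we have

$$\left| \mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y_z} D(x,z) \right| \geq \epsilon(z).$$

Note that if $\epsilon(z) > 0$ then either (a) $\mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y_z} D(x,z) \geq \epsilon(z)$ or (b)
$\mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y_z} D(x,z) \leq -\epsilon(z)$ holds for all $Y_z$ with $\mathbf{H}_\infty(Y_z) \geq k$. This follows
from the convexity of the set of distributions $Y_z$ with $\mathbf{H}_\infty(Y_z) \geq k$, which in turn implies that all values
of the expression $\mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y_z} D(x,z)$, over the choice of $Y_z$, form a convex set.
Therefore

$$\mathbf{E}_{x \leftarrow X|Z=z} D'(x,z) - \mathbf{E}_{x \leftarrow Y_z} D'(x,z) \geq \epsilon(z)$$

holds for all $Y_z$ with $\mathbf{H}_\infty(Y_z) \geq k$, where $D'$ is defined, depending on $z$, by

$$D'(x,z) := \begin{cases} D(x,z) & \text{in case (a)} \\ D^c(x,z) & \text{in case (b)} \\ 0 & \text{if } \epsilon(z) = 0. \end{cases} \qquad (8)$$

Since $\epsilon(z) \geq \frac{\epsilon}{2}$ holds^4 with probability at least $\frac{\epsilon}{2}$ over $z \leftarrow Z$, we get

$$\mathbf{E}_{x \leftarrow X|Z=z} D'(x,z) - \max_{Y_z:\, \mathbf{H}_\infty(Y_z) \geq k} \mathbf{E}_{x \leftarrow Y_z} D'(x,z) \geq \frac{\epsilon}{2}$$

with probability at least $\frac{\epsilon}{2}$ over $z \leftarrow Z$. For every such $z$ we obtain

$$\Pr_{x \leftarrow X|Z=z} \left[ D'(x,z) - \max_{Y_z:\, \mathbf{H}_\infty(Y_z) \geq k} \mathbf{E}\, D'(Y_z, z) \geq \frac{\epsilon}{4} \right] \geq \frac{\epsilon}{4}.$$

Taking the expectation over $z \leftarrow Z$ we conclude that

$$\Pr_{(x,z) \leftarrow (X,Z)} \left[ D'(x,z) - \max_{Y_z:\, \mathbf{H}_\infty(Y_z) \geq k} \mathbf{E}\, D'(Y_z, z) \geq \frac{\epsilon}{4} \right] \geq \frac{\epsilon^2}{8}.$$

Therefore, for either $D' = D$ or $D' = D^c$, the probability on the left-hand side of the above
inequality needs to be at least $\frac{1}{2} \cdot \frac{\epsilon^2}{8} = \frac{\epsilon^2}{16}$, which proves the claim. □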
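
The Markov-style principle used in the proof states that a random variable $W$ with $0 \leq W \leq M$ satisfies $\Pr[W > \frac{1}{2}\mathbf{E}W] \geq \frac{1}{2M}\mathbf{E}W$. The following sketch checks it exactly on finite distributions; the helper names and the sampling procedure are assumptions made for illustration only.

```python
import random

def markov_principle_holds(values, probs, bound):
    """Check Pr[W > E[W]/2] >= E[W] / (2 * bound) for a finite W with 0 <= W <= bound."""
    mean = sum(v * p for v, p in zip(values, probs))
    tail = sum(p for v, p in zip(values, probs) if v > mean / 2)
    return tail >= mean / (2 * bound) - 1e-12

def random_bounded_variable(size, bound, seed):
    """Sample a finite random variable: `size` values in [0, bound] with random weights."""
    rng = random.Random(seed)
    values = [rng.uniform(0, bound) for _ in range(size)]
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return values, [w / total for w in weights]
```

The inequality follows from $\mathbf{E}W \leq \frac{1}{2}\mathbf{E}W + M \cdot \Pr[W > \frac{1}{2}\mathbf{E}W]$, so the check should succeed for every input that respects the bound.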



6.1 Decomposable entropy 

We start with the trivial observation that Definition 8 is stronger than our Definition 9.

Theorem 5. Suppose that $\mathbf{H}^{\mathrm{Metric\text{-}d},s,\epsilon}(X|Z) \geq k$. Then $\widetilde{\mathbf{H}}^{|\mathrm{Metric}|,s,\epsilon}(X|Z) \geq k$.

Proof. Fix a distinguisher $D = D(x, z)$. According to Definition 8, for every $z$ we have a
distribution $Y_z$ with min-entropy at least $k(z)$ such that $|\mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y_z} D(x,z)| \leq \epsilon(z)$.

Consider the distribution $(Y, Z)$ defined by $(Y|Z = z) = Y_z$. Since $\mathbf{E}_{z \leftarrow Z}\, \epsilon(z) \leq \epsilon$, we obtain
inequality (3). In turn, the assumptions on $k(z)$ imply $\widetilde{\mathbf{H}}_\infty(Y|Z) \geq k$. □

The following theorem converts Metric Entropy into Modulus Entropy (cf. case (e) in Table [5]). 
Its principal significance is that the equivalence between both definitions is established, provided 
that Z is sufficiently short (grows at most logarithmically in the security parameters). 

Theorem 6. Suppose that $\mathbf{H}^{\mathrm{Metric},\mathrm{det}\{0,1\},s,\epsilon}(X|Z) \geq k$. Then $\mathbf{H}^{|\mathrm{Metric}|,s',\epsilon'}(X|Z) \geq k$, where $\epsilon' = 2^t \epsilon$ and $s' = s - O\left(2^{m-t} m\right)$.

Proof. For the sake of contradiction suppose that for some $D$ of complexity $s'$ and for every $(Y, Z)$ such that $\mathbf{H}_\infty(Y|Z) \geq k$ we have that
$$\mathbf{E}_{z \leftarrow Z} \left| \mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y|Z=z} D(x,z) \right| \geq \epsilon' .$$

Footnote. Throughout the proofs, we will make use of the simple Markov-style principle: let $X$ be a non-negative random variable bounded by $M$. Then $X > \frac{1}{2}\mathbf{E}X$ with probability at least $\frac{1}{2M}\mathbf{E}X$.

Applying the same reasoning as at the beginning of the proof of Lemma 7, we obtain that there exists a distinguisher $D'$ (cf. (8)) such that for every $Y_z$ with $\mathbf{H}_\infty(Y_z) \geq k$ it holds that
$$\mathbf{E}_{x \leftarrow X|Z=z} D'(x,z) - \mathbf{E}_{x \leftarrow Y_z} D'(x,z) \geq \epsilon'(z), \tag{9}$$
where $\mathbf{E}_{z \leftarrow Z}\, \epsilon'(z) \geq \epsilon'$. Thus, for every distribution $(Y, Z)$ with entropy $\mathbf{H}_\infty(Y|Z) \geq k$ we have
$$\mathbf{E}_{(x,z) \leftarrow (X,Z)} D'(x,z) - \mathbf{E}_{(x,z) \leftarrow (Y,Z)} D'(x,z) \geq \mathbf{E}_{z \leftarrow Z}\, \epsilon'(z) \geq \epsilon' .$$

Recall that, by (8), the value $D'(x,z)$ is defined as equal to $D(x,z)$ or $D^c(x,z)$ or $0$, depending on $z$. Instead, we can follow that construction with respect to only the $2^{m-t}$ "heaviest" values $z$ maximizing $\mathbf{P}(Z = z)\,\epsilon'(z)$ and setting $D'(x,z) = 0$ for other $z$. The obtained circuit is of size at most $s' + O\left(2^{m-t} m\right) = s$ and distinguishes with advantage at least $2^{-t}\epsilon' = \epsilon$. □
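The last step of the proof of Theorem 6 relies on a simple counting fact: keeping only a $2^{-t}$ fraction of the values $z$, chosen as those with the largest weight $\mathbf{P}(Z=z)\,\epsilon'(z)$, retains at least a $2^{-t}$ fraction of the total advantage. A minimal sketch of this fact follows; the function name `heaviest_share` and the example weights are hypothetical.

```python
from fractions import Fraction

def heaviest_share(weights, t):
    """Keep the len(weights) / 2**t largest weights.
    Returns (sum of kept weights, sum of all weights)."""
    keep = len(weights) // 2**t
    kept = sorted(weights, reverse=True)[:keep]
    return sum(kept), sum(weights)

# Hypothetical weights w(z) = P(Z = z) * eps'(z) for m = 3, i.e. eight values of z.
weights = [Fraction(n, 100) for n in (9, 1, 0, 4, 2, 7, 3, 5)]
kept_sum, total = heaviest_share(weights, t=1)
```

In this example the four heaviest values carry 25/100 of a total of 31/100, well above the guaranteed half.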

6.2 The samplability assumption 

In the next theorem we deal with the samplability assumption used in [CKLR11].

Theorem 7. Suppose that $(X, Z)$ is $(s, \epsilon)$-indistinguishable from a distribution $(Y', Z)$ with the following properties: (a) $\mathbf{H}_\infty(Y'|Z) \geq k$ and (b) there exists a randomized circuit $\Gamma$ receiving on its input $z \in \mathrm{supp}(Z)$ and returning samples from the distribution $Y'|Z = z$. Then
$$\mathbf{H}^{|\mathrm{Metric}|,\, s\epsilon - \mathrm{size}(\Gamma),\, 8\sqrt{\epsilon}}(X|Z) \geq k - 2\log\left(\frac{1}{\epsilon}\right) - 7 .$$

Proof. Suppose that $\mathbf{H}^{|\mathrm{Metric}|,s',\epsilon'}(X|Z) < k'$, where $k' = k - 2\log\left(\frac{1}{\epsilon}\right) - 7$, $\epsilon' = 8\sqrt{\epsilon}$ and $s' = \frac{\epsilon'^2}{64}\, s - \mathrm{size}(\Gamma)$. Thus, for some $D$ of size $s'$ and every $(Y, Z)$ with $\mathbf{H}_\infty(Y|Z) \geq k'$ we have
$$\mathbf{E}_{z \leftarrow Z} \left| \mathbf{E}_{x \leftarrow X|Z=z} D(x,z) - \mathbf{E}_{x \leftarrow Y|Z=z} D(x,z) \right| \geq \epsilon' . \tag{10}$$

Let $D'$ be a distinguisher obtained from Lemma 7. Consider the following distinguisher $D''$: on input $(x, z)$, which comes either from $(X, Z)$ or from $(Y', Z)$, do the following:

• for $i = 1$ to $\ell = \left\lceil \frac{64}{\epsilon'^2} \right\rceil - 1$ sample $y_i \leftarrow Y'|Z = z$ using the circuit $\Gamma$,

• if $D'(x,z) > \max_{i=1,\ldots,\ell} D'(y_i, z)$, output 1, otherwise output 0.

Clearly $D''$ has complexity at most $(\ell + 1) \cdot (s' + \mathrm{size}(\Gamma)) = s$. We will show that it gives sufficient distinguishing advantage. We start with the following easy observation, used implicitly already in [CKLR11] (the proof of Lemma 16).

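Lemma 8. Let $D$ be a $[0,1]$-valued function. If $Y^+$ is distributed uniformly over $\mathrm{Max}^k_D$, the set of $2^k$ points with the largest values of $D$, then for any $Y$ with $\mathbf{H}_\infty(Y) \geq k + \log\frac{1}{\delta}$ we have
$$\mathbf{P}_{x \leftarrow Y} \left[ D(x) - \mathbf{E}_{x \leftarrow Y^+} D(x) > 0 \right] < \delta .$$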
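The reason behind Lemma 8 is a counting argument: fewer than $2^k$ points can lie strictly above the average of the $2^k$ largest values, and a distribution with min-entropy $k + \log\frac{1}{\delta}$ puts mass at most $2^{-k}\delta$ on each of them. The sketch below computes this worst case exactly for a small domain; the function name `lemma8_worst_case` and the example values are assumptions for illustration.

```python
from fractions import Fraction

def lemma8_worst_case(values, k, delta):
    """values: list of D(x) in [0, 1] over the whole domain.
    Returns the largest possible P_{x<-Y}[D(x) > E D(Y+)] over all Y whose
    point masses are at most delta / 2**k, where Y+ is uniform over
    2**k points with the largest D-values."""
    top = sorted(values, reverse=True)[:2**k]
    threshold = sum(top) / len(top)              # E D(Y+)
    above = sum(1 for v in values if v > threshold)
    max_point_mass = delta / 2**k                # 2^-(k + log2(1/delta))
    return min(Fraction(1), above * max_point_mass)

# Hypothetical D on a 16-point domain with values 0/16, 1/16, ..., 15/16.
values = [Fraction(i, 16) for i in range(16)]
delta = Fraction(1, 2)
worst = lemma8_worst_case(values, k=2, delta=delta)
```

In this example only two points exceed the top-four average 27/32, so the worst case is 2 · (1/8) = 1/4, strictly below δ = 1/2.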

The proof that $D''$ is indeed a good distinguisher consists of two steps.

Claim 6.1. On input $(x, z) \leftarrow (X, Z)$ the circuit $D''$ outputs 1 with probability at least $\epsilon'^2/32$.

Proof. Consider a distribution $(Y^+, Z)$ such that for every $z$ the distribution $Y^+|Z = z$ is uniform over $\mathrm{Max}^{k'}_{D'(\cdot,z)}$. Since the $y_i$ are independent and distributed according to $Y'|Z = z$, it follows from Lemma 8 that $\mathbf{E}_{x \leftarrow Y^+|Z=z} D'(x,z) \geq \max_i D'(y_i, z)$ holds with probability at least $\frac{1}{2}$. Now, Lemma 7 yields $D'(x,z) > \mathbf{E}_{x \leftarrow Y^+|Z=z} D'(x,z)$ with probability at least $\frac{\epsilon'^2}{16}$ over $(x,z) \leftarrow (X,Z)$. Since sampling the $y_i$ is independent from $(X, Z)$, the claim follows. □

Claim 6.2. On input $(y, z) \leftarrow (Y', Z)$ the circuit $D''$ outputs 1 with probability at most $\epsilon'^2/64$.

Proof. Note that $y$ as well as the samples $y_1, \ldots, y_\ell$ are all independent copies of the same distribution $Y'|Z = z$. Therefore the probability that $D'(y,z) > \max_i D'(y_i,z)$ is at most $\frac{1}{\ell+1} \leq \frac{\epsilon'^2}{64}$. □

From the last two claims we get the inequality $\mathbf{P}(D''(X,Z) = 1) - \mathbf{P}(D''(Y',Z) = 1) \geq \frac{1}{64}\epsilon'^2$, which completes the proof of Theorem 7. □
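The symmetry argument behind Claim 6.2 can be made concrete: among $\ell + 1$ independent draws from the same distribution, a fixed draw is the strict maximum with probability at most $\frac{1}{\ell+1}$. The sketch below computes this probability exactly for a finite score distribution; `prob_beats_all_samples` and the example distribution are hypothetical names and data, not taken from the paper.

```python
from fractions import Fraction

def prob_beats_all_samples(score_dist, ell):
    """score_dist: dict mapping a score D'(y, z) to its probability under Y'|Z=z.
    Returns P[score(y) > max_i score(y_i)] for i.i.d. draws y, y_1, ..., y_ell."""
    total = Fraction(0)
    below = Fraction(0)          # P[score < s], accumulated in increasing order of s
    for s in sorted(score_dist):
        total += score_dist[s] * below**ell
        below += score_dist[s]
    return total

# Hypothetical boolean scores: D' accepts a sample from Y'|Z=z with probability 1/4.
score_dist = {0: Fraction(3, 4), 1: Fraction(1, 4)}
ell = 3
p_win = prob_beats_all_samples(score_dist, ell)
```

With these numbers the input beats all three samples with probability 27/256, below the symmetry bound 1/4.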



6.3 Approximate counting 

It turns out that, using a technique called approximate counting, one can show a conversion from metric to modulus entropy. However, we need some additional assumptions to achieve both high accuracy and efficiency in the approximate counting:

Theorem 8. Suppose that one of the following is true:

(a) $\mathbf{H}^{\mathrm{Metric},\mathrm{rand}\{0,1\},s,\epsilon}(X|Z) \geq k$ against circuits of size $s$,

(b) $\mathbf{H}^{\mathrm{Metric},\mathrm{det}\{0,1\},s,\epsilon}(X|Z) \geq k$ against circuits of size $s = \mathrm{poly}(n)$, with an access to an NP-oracle.

Then we have $\mathbf{H}^{|\mathrm{Metric}|,s',\epsilon'}(X|Z) \geq k'$, where $\epsilon' = 8\sqrt{\epsilon}$, $k' = k - \log\frac{1}{\epsilon}$ and $s'$ is given by
$$s' = O\left( s \cdot \frac{2^{k-n}\,\epsilon^2}{\log(1/\epsilon)} \right)$$
in case (a), or $s' = \mathrm{poly}(n, \epsilon)$ in case (b).

Note that to make the conversion in (a) efficient, we need the assumption that $k$ is large, as it is easy to see that if $k$ is much smaller than $n$ then, in the formula that gives the bound on $s'$, the $2^{k-n}$ factor starts to dominate over $\epsilon$.

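Proof of Theorem 8. Suppose that $\mathbf{H}^{|\mathrm{Metric}|,s',\epsilon'}(X|Z) < k'$. Then Lemma 7 implies that for all $Y \in \{0,1\}^n$ with $\mathbf{H}_\infty(Y|Z) \geq k'$ and some distinguisher $D'$ of complexity $s' + 1$ we have
$$\mathbf{P}_{(x,z) \leftarrow (X,Z)} \left[ D'(x,z) - \mathbf{E} D'(Y|Z=z,\, z) \geq \frac{\epsilon'}{4} \right] \geq \frac{\epsilon'^2}{16} . \tag{11}$$

Since
$$\max_{Y_z:\, \mathbf{H}_\infty(Y_z) \geq k'} \mathbf{E} D'(Y_z, z) = \min\left(1,\, 2^{-k'} \left|D'(\cdot,z)\right| \right), \tag{12}$$
combining it with (11), we obtain
$$\mathbf{P}_{(x,z) \leftarrow (X,Z)} \left[ D'(x,z) - \frac{\left|D'(\cdot,z)\right|}{2^{k'}} \geq \frac{\epsilon'}{4} \right] \geq \frac{\epsilon'^2}{16} . \tag{13}$$

We now show that there exists a function $h$ such that for every $z$ the random variable $h(z)$ satisfies
$$\mathbf{P} \left[ \left| h(z) - \frac{\left|D'(\cdot,z)\right|}{2^{k'}} \right| > \frac{\epsilon'}{8} \right] \leq \frac{\epsilon'^2}{64} \tag{14}$$
and $h(z)$ is samplable for all $z$ satisfying $\left|D'(\cdot,z)\right| < 2^{k'}$. More precisely: there exists a randomized circuit of size $O\left( s' \cdot \frac{2^{n-k'} \log(1/\epsilon')}{\epsilon'^2} \right) \leq s$, which computes $h(z)$ correctly for every such $z$. This is a corollary of the following claim.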



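Claim 6.3. Let $D$ be a boolean circuit such that $|D| \leq 2^k$. Then for $\delta', \delta'' \in (0,1)$, $\ell > \frac{4 \cdot 2^{n-k}}{\delta'^2} \log\frac{1}{\delta''}$ and $(U_i)_{i=1}^{\ell}$ being independent and uniform, the following inequality holds:
$$\mathbf{P} \left[ \left| 2^{n-k} \cdot \frac{1}{\ell} \sum_{i=1}^{\ell} D(U_i) - 2^{-k} |D| \right| \geq \delta' \right] \leq 2\delta'' .$$

Proof. Define a random variable $S = \frac{1}{\ell} \sum_{i=1}^{\ell} D(U_i)$. The Chernoff Inequality (in the version given in the footnote below) yields
$$\mathbf{P} \left[ \left| S - \mathbf{E} D(U) \right| \geq \delta \right] \leq 2 \max\left( e^{-\lambda^2/4},\, e^{-\lambda\sigma/2} \right),$$
where $\sigma^2 = \mathrm{Var}\left( \sum_{i} D(U_i) \right)$ and $\lambda\sigma = \ell\delta$. Since $\mathrm{Var}(D(U_i)) = 2^{k-n}\left(1 - 2^{k-n}\right)$ we have $\sigma^2 = \ell \cdot 2^{k-n}\left(1 - 2^{k-n}\right)$. By setting $2^{n-k}\delta = \delta'$ we get $\lambda^2 \geq \ell\, 2^{k-n} \delta'^2$ and $\lambda\sigma = \ell\, 2^{k-n} \delta'$. Choosing $\ell$ sufficiently large (so that $2^{k-n}\, \ell\, \delta'^2 > 4 \log\frac{1}{\delta''}$) we obtain
$$\mathbf{P} \left[ \left| S \cdot 2^{n-k} - 2^{-k}|D| \right| \geq \delta' \right] < 2\delta'' .$$
□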
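The estimator analysed in Claim 6.3 is plain Monte Carlo counting: average $D$ over $\ell$ uniform inputs, then rescale by $2^{n-k}$ to approximate $2^{-k}|D|$. A minimal sketch, assuming a toy predicate in place of a circuit; the names `estimate_fraction` and `D` below are illustrative and not part of the paper.

```python
import random

def estimate_fraction(D, n, ell, rng):
    """Average of D over ell independent uniform n-bit inputs,
    i.e. an estimate of |D| / 2^n."""
    hits = sum(D(rng.getrandbits(n)) for _ in range(ell))
    return hits / ell

# Hypothetical boolean "circuit": accepts n-bit inputs whose two low bits are zero,
# so |D| / 2^n equals 1/4 exactly.
def D(u):
    return 1 if u & 0b11 == 0 else 0

rng = random.Random(0)
estimate = estimate_fraction(D, n=16, ell=20000, rng=rng)
```

The number of samples needed grows like $2^{n-k}$ when the target is the rescaled quantity $2^{-k}|D|$, which is why the conversion is efficient only for large $k$.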



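Defining $h(z) = 2^{n-k'} \cdot \frac{1}{\ell} \sum_{i=1}^{\ell} D'(U_i, z)$, we obtain a required sampler for $h(z)$. Consider the following distinguisher $D''$: on input $(x, z)$, which comes either from $(X, Z)$ or from $(Y, Z)$, return 1 iff $D'(x,z) > h(z) + \frac{\epsilon'}{8}$. We will prove that $D''$ distinguishes between $(X, Z)$ and all $(Y, Z)$ satisfying $\mathbf{H}_\infty(Y|Z) \geq k$. Note that if $D''(x,z) = 1$ then $h(z) < 1 - \frac{\epsilon'}{8}$ and hence $\left|D'(\cdot,z)\right| < 2^{k'}$. In particular, $D''$ is of complexity at most $s$. Now, inequalities (14) and (13) yield
$$\mathbf{P}_{(x,z) \leftarrow (X,Z)} \left[ D'(x,z) > h(z) + \frac{\epsilon'}{8} \right] \geq \frac{\epsilon'^2}{16} - \frac{\epsilon'^2}{64} = \frac{3}{4} \cdot \frac{\epsilon'^2}{16} .$$

Choosing $k' = k - \log\frac{1}{\delta}$ where $\delta = \frac{\epsilon'^2}{64}$, using (12), (13), (14) and Lemma 8 we obtain
$$\mathbf{P}_{(x,z) \leftarrow (Y,Z)} \left[ D'(x,z) > h(z) + \frac{\epsilon'}{8} \right] \leq \mathbf{P}_{(x,z) \leftarrow (Y,Z)} \left[ D'(x,z) > \frac{\left|D'(\cdot,z)\right|}{2^{k'}} \right] + \frac{\epsilon'^2}{64} \leq \frac{\epsilon'^2}{64} + \frac{\epsilon'^2}{64} = \frac{1}{2} \cdot \frac{\epsilon'^2}{16} .$$

Combining the last two estimates yields, if only $\mathbf{H}_\infty(Y|Z) \geq k$, the inequality
$$\mathbf{P}\left[ D''(X,Z) = 1 \right] - \mathbf{P}\left[ D''(Y,Z) = 1 \right] \geq \frac{\epsilon'^2}{64} = \epsilon,$$
which completes the proof for case (a). In case (b), we proceed in the same way but we compute the numbers $h(z)$ using an NP oracle. The basic result we use can be stated as follows: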

Lemma 9 [OG09]. There is a probabilistic algorithm which, given a boolean circuit $D$ over $\{0,1\}^n$ of size $\mathrm{poly}(n)$ and a natural number $M$, decides, with success probability at least $\frac{2}{3}$, whether $\frac{1}{4}M < |D| < 4M$, in time $\mathrm{poly}(n)$, using an oracle for NP.

Let us make three important observations:

• The success probability $\frac{2}{3}$ can be amplified to $1 - \delta$ by repeating the algorithm $O\left(\log\frac{1}{\delta}\right)$ times and taking the majority answer.

• The factor 4 can be improved to $1 + \gamma$ by running the algorithm on the circuit $D' = D_1 \wedge \ldots \wedge D_k$, where $D_i$ for $i = 1, \ldots, k$ are copies of $D$ on disjoint inputs and $k$ is such that $(1 + \gamma)^k \geq 4$.
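The second observation rests on the identity $|D_1 \wedge \ldots \wedge D_k| = |D|^k$ when the copies read disjoint input blocks, so a factor-4 estimate of $|D|^k$ becomes a factor-$4^{1/k}$ estimate of $|D|$ after taking the $k$-th root. The brute-force check below is a sketch on a toy predicate; `count_accepting`, `and_of_copies` and the example `D` are hypothetical names chosen for illustration.

```python
from itertools import product

def count_accepting(D, n):
    """Brute-force |D|: the number of n-bit inputs accepted by D."""
    return sum(D(bits) for bits in product((0, 1), repeat=n))

def and_of_copies(D, n, k):
    """Predicate on k*n bits: the AND of k copies of D on disjoint n-bit blocks."""
    def D_k(bits):
        return int(all(D(bits[i * n:(i + 1) * n]) for i in range(k)))
    return D_k

# Hypothetical 3-bit predicate accepting inputs with at least two ones (|D| = 4).
def D(bits):
    return int(sum(bits) >= 2)

n, k = 3, 2
size_D = count_accepting(D, n)
size_Dk = count_accepting(and_of_copies(D, n, k), k * n)
```

Brute force is used here only to confirm the identity on a tiny instance; the lemma itself relies on the NP oracle precisely because such enumeration is infeasible in general.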



Footnote. We use the following version: let $X_i$ be random variables satisfying $\left| X_i - \mathbf{E}X_i \right| \leq 1$ and $X = \sum_i X_i$. Then $\mathbf{P}\left[ \left| X - \mathbf{E}X \right| \geq \lambda\sigma \right] \leq 2 \max\left( e^{-\lambda^2/4},\, e^{-\lambda\sigma/2} \right)$, where $\sigma^2 = \mathrm{Var}(X)$.

Hence, there is an algorithm which, with probability at least $1 - \delta$, computes a value $M$ such that $(1 - \gamma)M < |D| < (1 + \gamma)M$, in time $\mathrm{poly}\left(n, \frac{1}{\gamma}, \log\frac{1}{\delta}\right)$, using an oracle for NP. For every $z$, let $M(z)$ be a value obtained by applying this algorithm to the circuit $D'(\cdot, z)$ with $\gamma = \frac{\epsilon'}{16}$ and $\delta = \frac{\epsilon'^2}{64}$. Define $h(z) := 2^{-k'} M(z)$. If $\left|D'(\cdot,z)\right| < 2^{k'}$, then $\left| M(z) - \left|D'(\cdot,z)\right| \right| \leq 2 \cdot 2^{k'} \cdot \gamma$ holds with probability at least $1 - \delta$, and thus for such values $z$ the same estimate as in (14) holds. We proceed further with $h$ as in the previous proof. □

7 Some technical results 

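Lemma 10. Let $X \in \{0,1\}^m$ be a random variable, $f : \{0,1\}^m \to \{0,1\}^n$ be a deterministic function computable by a circuit of size $s$, and $\epsilon$ satisfy $0 < \epsilon < \frac{1}{12}$. Then
$$\mathbf{H}^{\mathrm{Metric},\mathrm{det}\{0,1\},s,\epsilon}\left( f(X) \,|\, X \right) < 3 .$$

Proof. Consider the following distinguisher $D$: on the input $(y, x)$, where $x \in \{0,1\}^m$ and $y \in \{0,1\}^n$, run $f(x)$ and return 1 iff $f(x) = y$. Then for every $x$ we get $D(f(x), x) = 1$. Let $Y$ be any random variable over $\{0,1\}^n$ such that $\mathbf{H}_\infty(Y|X) \geq 3$. Then by Lemma 1 we obtain
$$\mathbf{H}_\infty(Y|X = x) \geq 3 - \log_2(3)$$
with probability $\frac{2}{3}$ over $x \leftarrow X$. Since $D(y, x) = 1$ iff $y = f(x)$, for any such $x$ we have
$$\mathbf{E}_{y \leftarrow Y|X=x} D(y, x) \leq 2^{-(3 - \log_2(3))} = \frac{3}{8},$$
and thus, with probability $\frac{2}{3}$ over $x \leftarrow X$,
$$\mathbf{E}_{y \leftarrow f(X)|X=x} D(y, x) - \mathbf{E}_{y \leftarrow Y|X=x} D(y, x) \geq \frac{5}{8} .$$

Taking the expectation over $x \leftarrow X$ we finally obtain
$$\mathbf{E}_{(y,x) \leftarrow (f(X),X)} D(y, x) - \mathbf{E}_{(y,x) \leftarrow (Y,X)} D(y, x) \geq \frac{2}{3} \cdot \frac{5}{8} - \frac{1}{3} \cdot 1 = \frac{1}{12} .$$
□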
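The constants in the proof of Lemma 10 can be recomputed with exact arithmetic. The sketch below is only a check of the numbers; the function `lemma10_advantage` and its parameter names are assumptions introduced here.

```python
from fractions import Fraction

def lemma10_advantage(entropy_bits=3, good_fraction=Fraction(2, 3)):
    """Returns (hit, advantage).
    hit: upper bound on P[Y = f(x) | X = x] for 'good' x, where Y|X=x has
         min-entropy at least entropy_bits - log2(1 / (1 - good_fraction)).
    advantage: good_fraction * (1 - hit) - (1 - good_fraction) * 1,
         the lower bound used in the proof."""
    hit = Fraction(1, 2**entropy_bits) / (1 - good_fraction)
    gap_on_good = 1 - hit
    advantage = good_fraction * gap_on_good - (1 - good_fraction) * 1
    return hit, advantage

hit, advantage = lemma10_advantage()
```

With 3 bits of entropy and a good fraction of 2/3 this reproduces the bound 3/8 on the hitting probability and the final advantage 1/12.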

We use the lemma above to show that the estimate in Lemma 3 cannot be improved:

Theorem 9 (Tightness of the estimate in Lemma 3). Suppose that there exists an exponentially secure pseudorandom generator $f$. Then for every $m$ and $C > 0$ we have
$$\mathbf{H}^{\mathrm{HILL},\mathrm{rand}\{0,1\},2^{\Omega(m)},2^{-\Omega(m)}}\left( f(U_m) \right) \geq m + C$$
and simultaneously,
$$\mathbf{H}^{\mathrm{Metric},\mathrm{det}\{0,1\},\mathrm{poly}(m),\frac{1}{12}}\left( f(U_m) \,|\, U_m \right) \leq 3 .$$

Proof. The first inequality follows directly from the definition of the exponentially secure pseudorandom generator. The second inequality is implied by Lemma 10. □

Below we prove the equivalence between boolean and real-valued distinguishers.

Theorem 10. For any random variables $X$, $Z$ over $\{0,1\}^n$, $\{0,1\}^m$ we have
$$\mathbf{H}^{\mathrm{Metric},\mathrm{det}[0,1],s',\epsilon}(X|Z) = \mathbf{H}^{\mathrm{Metric},\mathrm{det}\{0,1\},s,\epsilon}(X|Z),$$
where $s' \approx s$.

Proof. We only need to prove $\mathbf{H}^{\mathrm{Metric},\mathrm{det}[0,1],s',\epsilon}(X|Z) \geq \mathbf{H}^{\mathrm{Metric},\mathrm{det}\{0,1\},s,\epsilon}(X|Z)$, as the other direction is trivial (because the class $(\mathrm{det}[0,1], s)$ is larger than $(\mathrm{det}\{0,1\}, s)$). Suppose that $\mathbf{H}^{\mathrm{Metric},\mathrm{det}[0,1],s,\epsilon}(X|Z) < k$. Then for some $D$ and all $Y$ satisfying $\mathbf{H}_\infty(Y|Z) \geq k$ we have
$$\left| \mathbf{E}_{(x,z) \leftarrow (X,Z)} D(x,z) - \mathbf{E}_{(x,z) \leftarrow (Y,Z)} D(x,z) \right| \geq \epsilon .$$

Applying the same reasoning as in Thm. 6, we can replace $D$ with $D'$ which is equal either to $D$ or to $D^c$, obtaining, for all distributions $(Y, Z)$ with $\mathbf{H}_\infty(Y|Z) \geq k$, the following:
$$\mathbf{E} D'(X, Z) - \mathbf{E} D'(Y, Z) \geq \epsilon .$$

Consider the distribution $(Y^+, Z)$ minimizing the left side of the above inequality. Equivalently, it maximizes the expected value of $D'$ under the condition $\mathbf{H}_\infty(Y|Z) \geq k$. Since this condition means that $\mathbf{H}_\infty(Y^+|Z = z) \geq k$ for all $z$, we conclude that $Y^+|Z = z$, for fixed $z$, is distributed over the $2^k$ values of $x$ giving the greatest values of $D'(x,z)$. Calculating the expected values in the last inequality via integration of the tail yields
$$\int_{t \in [0,1]} \mathbf{P}_{(x,z) \leftarrow (X,Z)} \left[ D'(x,z) > t \right] dt - \int_{t \in [0,1]} \mathbf{P}_{(x,z) \leftarrow (Y^+,Z)} \left[ D'(x,z) > t \right] dt \geq \epsilon,$$
therefore for some number $t \in (0,1)$ the following holds:
$$\mathbf{P}_{(x,z) \leftarrow (X,Z)} \left[ D'(x,z) > t \right] \geq \mathbf{P}_{(x,z) \leftarrow (Y^+,Z)} \left[ D'(x,z) > t \right] + \epsilon .$$

Let $D''$ be a $\{0,1\}$-distinguisher that for every $(x, z)$ outputs 1 iff $D'(x,z) > t$. Clearly $D''$ is of size $s + O(1)$ and satisfies
$$\mathbf{E}_{(x,z) \leftarrow (X,Z)} D''(x,z) \geq \mathbf{E}_{(x,z) \leftarrow (Y^+,Z)} D''(x,z) + \epsilon .$$

We assumed that $(Y^+, Z)$ maximizes $\mathbf{E} D'(Y, Z)$. Now we argue that $(Y^+, Z)$ is also maximal for $D''$. We know that for every $z$ the distribution $Y^+|Z = z$ is flat over the set $\mathrm{Max}^k_{D'(\cdot,z)}$ of $2^k$ values of $x$ corresponding to the largest values of $D'(x,z)$. It is easy to see that $\mathrm{Max}^k_{D'(\cdot,z)} = \mathrm{Max}^k_{D''(\cdot,z)}$. Therefore, we have shown in fact that
$$\mathbf{E}_{(x,z) \leftarrow (X,Z)} D''(x,z) - \max_{(Y,Z):\, \mathbf{H}_\infty(Y|Z) \geq k} \mathbf{E}_{(x,z) \leftarrow (Y,Z)} D''(x,z) \geq \epsilon,$$
which means exactly that $\mathbf{H}^{\mathrm{Metric},\mathrm{det}\{0,1\},s,\epsilon}(X|Z) < k$. □
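The thresholding step in the proof of Theorem 10 uses the identity $\mathbf{E} D' = \int_0^1 \mathbf{P}[D' > t]\,dt$: if two expectations differ by $\epsilon$, some single threshold $t$ must separate the tail probabilities by at least $\epsilon$. The following sketch finds such a threshold for finite distributions of the value $D'(x,z)$; the helper names and the two toy distributions are assumptions for illustration.

```python
from fractions import Fraction

def expectation(dist):
    """dist: dict mapping a value of D' in [0, 1] to its probability."""
    return sum(v * p for v, p in dist.items())

def tail(dist, t):
    """P[D' > t] under dist."""
    return sum(p for v, p in dist.items() if v > t)

def best_threshold(dist_x, dist_y):
    """Return (t, gap) maximising P_X[D' > t] - P_Y[D' > t].
    The tails are step functions changing only at attained values,
    so it suffices to try t = 0 and each attained value."""
    candidates = sorted({Fraction(0)} | set(dist_x) | set(dist_y))
    return max(((t, tail(dist_x, t) - tail(dist_y, t)) for t in candidates),
               key=lambda pair: pair[1])

# Hypothetical distributions of D'(x, z) under (X, Z) and under (Y+, Z).
dist_x = {Fraction(1, 4): Fraction(1, 4), Fraction(3, 4): Fraction(1, 2),
          Fraction(1): Fraction(1, 4)}
dist_y = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}
t, gap = best_threshold(dist_x, dist_y)
```

Here the expectations differ by 3/16, and the threshold $t = 1/4$ already yields a boolean distinguisher with gap 1/4.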